IBIS Macromodel Task Group

Meeting date: 11 September 2018

Members (asterisk for those attending):
ANSYS:                        Dan Dvorscak
                            * Curtis Clark
Cadence Design Systems:     * Ambrish Varma
                              Brad Brim
                              Kumar Keshavan
                              Ken Willis
eASIC:                        David Banas
GlobalFoundries:              Steve Parker
IBM:                          Luis Armenta
                              Trevor Timpane
Intel:                        Michael Mirmak
Keysight Technologies:      * Fangyi Rao
                              Radek Biernacki
                              Ming Yan
                              Stephen Slater
Mentor, A Siemens Business:   John Angulo
                            * Arpad Muranyi
Micron Technology:          * Randy Wolff
                            * Justin Butterfield
SiSoft:                     * Walter Katz
                            * Mike LaBonte
SPISim:                     * Wei-hsing Huang
Synopsys:                     Rita Horner
                              Kevin Li
Teraspeed Consulting Group:   Scott McMorrow
Teraspeed Labs:             * Bob Ross

The meeting was led by Arpad Muranyi.  Curtis Clark took the minutes.

--------------------------------------------------------------------------------
Opens:
- None.

-------------
Review of ARs:

- Randy to investigate if/why/how a clock waveform input might be used.
  - In progress.

- Michael M. to investigate if/why/how a clock waveform input might be used.
  - In progress.

--------------------------
Call for patent disclosure:

- None.

-------------------------
Review of Meeting Minutes:

Arpad asked for any comments or corrections to the minutes of the September 04
meeting.  Walter moved to approve the minutes.  Ambrish seconded the motion.
There were no objections.

-------------
New Discussion:

Vref and DDR5 improvements:

Arpad suggested we continue the previous week's discussion, and briefly shared
the previous week's minutes for an overview.  Arpad noted particular interest
in two of Ambrish's comments:
1.  How much are we losing by keeping things simple and using a differential
waveform, and what would we gain by going to single-ended?
2.  Ambrish had noted their interest in writing a common clock BIRD, and noted
that they felt this was the most important topic to address.

Walter noted that he had sent an email attempting to summarize the state of the
various DDR5 issues and asked to review it.

(Text of email "Summary of DDR5 issues" sent to the ATM group)
1. Asymmetric rise and fall times of a single ended channel.
   a. Both Cadence and SiSoft believe that this can be done by the EDA tool
      without any changes to the standard.
   b. Keysight believes that the standard is incomplete because:
      i. It does not define how to generate the Impulse Response input to
         AMI_Init
     ii. It does not define how to generate the waveform input to the Rx
         AMI_GetWave (Fangyi noted this is their primary issue)
    iii. Or the AMI methodology is invalid for single ended DDR5 DQ channels.
2. Adding a DC Offset or replacing the Impulse Response input to AMI_Init with
   a Step Response
   a. Both are equivalent since a step response can be derived from an Impulse
      Response and DC Offset and vice-versa.
   b. In any case, one of these needs to be done.
   c. Impact of Tx equalization on the DC offset.  (item requested by Fangyi)
3. VrefDQ
   a. The physical memory DDR5 buffer has a register that must be set by the
      controller to define the VrefDQ in the chip.
   b. This will be very close to the DC Offset defined above, but not
      necessarily so.
   c. Need to define how an EDA tool handles the impairment caused by the VrefDQ
      register resolution, and because a single VrefDQ register may control
      several DQ channels with slightly different DC Offsets.
4. Clock Ticks
   a. The DQS to DQ skew in the DDR5 memory receiver is defined by the
      Controller. This skew is determined by simulation, or by a hardware
      training algorithm.
   b. One way to handle this is to put a CDR in the memory DQ Rx and assume that
      this CDR will find, use and report the optimal DQS/DQ phase.
   c. A possible useful reserved parameter is the DQS/DQ interconnect skew.
   d. Another way is to have the Controller Tx AMI Model generate clock ticks
      that the Memory Rx AMI Model reads and uses. A BIRD 147 protocol can be
      defined between the Tx and Rx to optimize this skew (and the Rx DFE taps
      as well).
5. Component Based AMI Simulations
   a. Both Cadence and SiSoft believe that this should be dealt with by the EDA
      tool. It knows the DQS/DQ interconnect skew for each DQ in a “Component”,
      and therefore can determine the required skew training parameters or the
      impairment added to the nibble. Note that a component in this context can
      be a single memory chip or multiple memory chips in a module. Similarly,
      the EDA tool knows the Vcent for each DQ channel and can calculate the
      ideal VrefDQ for the module and the impairment. There is little or no
      difference between DDR4 and DDR5 in this regard.
   b. Keysight believes that IBIS AMI needs to be enhanced (or a new
      methodology) to deal with Component Level AMI Simulations for DDR5.
6. Power Aware Simulations (Arpad raised the subject.  Walter noted he had left
   it off the list originally because it received little attention in the straw
   poll).
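
The equivalence stated in item 2.a. can be sketched as follows.  This is a
minimal illustration only, assuming uniformly sampled data and assuming that
the DC Offset is the midpoint between the start and end values of the step
response.  The function names are hypothetical and are not part of the IBIS
specification.

```python
# Sketch only: function names and the sampling convention are assumptions.

def step_from_impulse(impulse, dt, dc_offset):
    """Integrate an impulse response (V/s) into a step response (V).

    dc_offset is taken to be the midpoint between the initial and final
    values of the step response.
    """
    swing = sum(impulse) * dt          # total start-to-finish range
    level = dc_offset - swing / 2.0    # starting level of the step
    step = [level]
    for h in impulse:
        level += h * dt                # running integral of the impulse
        step.append(level)
    return step


def impulse_and_offset_from_step(step, dt):
    """Recover the impulse response and the DC offset from a step response."""
    impulse = [(b - a) / dt for a, b in zip(step, step[1:])]
    dc_offset = (step[0] + step[-1]) / 2.0
    return impulse, dc_offset
```

Under these assumptions either representation can be rebuilt from the other,
which is why adding the DC Offset or switching to a Step Response input would
carry the same information.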
   
In depth discussion of items:

1. Asymmetric rise/fall - Walter asked if the summary accurately captured
   people's statements.  Fangyi agreed, and noted that 1.b.ii. was the primary
   issue.
   
2. DC Offset - Fangyi asked if this item contradicted the assertion in 1.a.
   that no change to the standard was needed.  Walter noted that needing one
   more reserved parameter to pass in the offset was not at the level of
   "changing the standard."  Fangyi asked for item 2.c. to be added.
   
3. VrefDQ - Walter recapped and noted that VrefDQ is not the same as DC Offset.
   DC Offset is the midpoint between the ends of the step response, which is
   a simulation result, whereas VrefDQ is a register value set in the
   memory.  There are issues of VrefDQ resolution and how that differs from
   DC Offset.  All of this is independent of Vcent, which is a separate
   issue related to an eye measurement of all the bits in a nibble.  No one
   disagreed that these were issues to be dealt with.

4. Clock Ticks - Walter noted that DQS to DQ skew in DDR memory is defined by
   the controller.  The skew setting is determined by simulation or by a
   hardware training algorithm.  So, for writes, where the memory is the Rx, the
   clock is fed to the memory and the skew between clock and data is set by the
   controller.  One way to handle this in simulation is to put a CDR in the
   memory (DQ Rx model) and assume that the CDR will find, use, and report the
   optimal DQS to DQ phase at the memory.  A Reserved parameter to define the
   DQS to DQ skew is one possible solution.  Another way would be for the
   controller Tx to generate clock ticks that the memory Rx uses.

   Ambrish noted that in their solution they don't use a CDR in their Rx model,
   they use the strobe signal to generate the clock information.  Walter asked
   how the phase between the DQ and DQS was defined.  Ambrish said that if the
   DQ and DQS waveforms were generated concurrently, then there was no need for
   the phase to be calculated at the Rx.  Fangyi noted that there were two
   related issues at play.  The controller adjusts the phase difference between
   DQ and DQS.  That is a single value that is determined during training and
   persists until the next training.  Randy noted that this skew was per DQ,
   i.e., each DQ has its own skew.  Fangyi agreed.  Fangyi noted that the second
   issue is that the clock transition used by the DRAM DFE suffers from jitter
   in the DQS signal.  So, there are two issues.  Fangyi said once the single
   fixed skew is determined (training mode), this could be passed to the Rx,
   but the Rx will still need to recover clock ticks from the DQS signal.
   Walter noted that the standard doesn't currently allow for the DQS signal to
   be passed into the model.  Ambrish said the EDA tool could determine the
   clock ticks from the DQS signal and pass these into the Rx model in the same
   memory (clock_times argument to GetWave()) currently used by the Rx model to
   return clock ticks.  Walter noted that this too is not currently allowed in
   the standard.  Walter noted that the point of this exercise is to simply
   agree on the issues, not necessarily the solutions.
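
One possible reading of Ambrish's suggestion in item 4, that the EDA tool
derive clock ticks from the DQS waveform, is sketched below.  This is an
illustration under stated assumptions, not a method taken from the standard:
the function name, the linear interpolation of threshold crossings, the use
of both edges, and the single fixed skew value are all hypothetical.

```python
# Sketch only: the name, arguments and crossing rule are assumptions.

def clock_ticks_from_dqs(times, dqs, threshold=0.0, skew=0.0):
    """Return interpolated threshold-crossing times of a DQS waveform.

    times, dqs : equal-length lists of sample times (s) and voltages (V)
    threshold  : crossing level; 0.0 suits a differential strobe
    skew       : fixed per-DQ offset (e.g. from training) added to each tick
    """
    ticks = []
    for i in range(1, len(dqs)):
        v0 = dqs[i - 1] - threshold
        v1 = dqs[i] - threshold
        # Count both rising and falling edges; v0 is strictly off the
        # threshold, so a sample sitting exactly on it is counted once.
        if (v0 < 0.0 <= v1) or (v0 > 0.0 >= v1):
            frac = -v0 / (v1 - v0)     # linear interpolation between samples
            t = times[i - 1] + frac * (times[i] - times[i - 1])
            ticks.append(t + skew)
    return ticks
```

Because the ticks are taken from the simulated DQS waveform, any jitter on
DQS carries through to the ticks, which is the second issue Fangyi raised.
As Walter noted, passing such ticks into the Rx model is not currently
allowed by the standard.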

Review of Walter's views on the BIRDs required to accomplish everything.

1.  Define a new parameter DC_offset that represents the mid-point of the
    start-to-finish range of the step response.

2.  Cadence and SiSoft don't think a BIRD is required to address asymmetric rise
    and fall rates.  Walter noted that they may need to convince users and DDR5
    model makers that the solutions they've implemented are sufficiently
    accurate, but there is no need to modify the spec.  Fangyi again objected to
    this.  He said we can't just say the tool can do whatever it wants.  Walter
    noted that an EDA vendor could demonstrate that their method gets results
    that are very close to a full SPICE simulation by doing some fancy
    convolutions.  They could document that method and its accuracy, and then
    we have a solution.  Fangyi asked why we need a standard at all in that
    case.  Ambrish and Walter said the standard tells you that you need a
    waveform into Rx GetWave().  It doesn't tell you how to generate that
    waveform.  The flow that is given in the standard happens to be valid for
    differential signaling and has issues with single-ended.  We could write a
    BIRD to define the way to do it for DDR5, or we can say Cadence, SiSoft,
    etc., each decide to do it their own way.  Fangyi said this was analogous to
    writing a BSIM model only to find out that one tool doesn't apply the same
    fundamental physics to the BSIM model that others do.  In that case a BSIM
    model would be useless.  Walter said this could be a future discussion
    point.

3.  Walter noted that he doesn't think we need a VrefDQ parameter.  It becomes
    a voltage impairment that can be rolled into Rx_Receiver_Sensitivity.  Since 
    the model is told what the DC Offset is, it can determine its VrefDQ
    granularity impairment.  We may decide to define a Reserved parameter for
    VrefDQ, but it's not strictly necessary.

4.  DQS to DQ skew - Walter noted that the controller generates the DQ and DQS
    signals so the controller model could generate the clock ticks.  Ambrish
    disagreed and said the tool generates the DQS waveform, passes it through
    the DQS channel, captures it at the Rx and gives clock ticks to the DQ Rx.
    Walter asked what phase is used when the EDA tool gives the DQS waveform to
    the DQ Rx.  Fangyi said the phase that resulted from the training would
    be built into the DQS waveform.  Ambrish noted 90 degrees out of phase with
    the DQ, for example.  Walter asked how the clock tick array is passed into
    the Rx model by the EDA tool.  Ambrish noted the clock_times array would be
    used as an input.  Walter said a BIRD would be needed for that approach.
    Ambrish agreed and said it's one of the two things his group feels is
    necessary to address.

    Fangyi noted another solution was to have the tool simply generate the DQS
    waveform, and then the DQ Rx model gets both the data DQ signal waveform and
    the clock DQS signal waveform.  Then let the Rx model recover the clock
    ticks.  Fangyi noted that you might have 8 DQs and the one DQS.  Randy noted
    that as a model maker he wouldn't write a model with the complexity to
    have the entire DQ and data clock chain with all 8 DQs tied together.  That
    was more of a component level of modeling.  Randy also noted that if the
    clock ticks were to be passed into the DQ model as Ambrish proposed, then we
    really have to understand the phase difference that Walter had mentioned.
    Randy noted that the models usually assume the training has put the strobe
    in the best location for the DQ sampling.  If the EDA tool were to simply
    pass in a strobe with an ideal 90-degree offset, then that might not provide
    the right answers.  He said we needed to consider the difference between the
    EDA tool passing in clock ticks, perhaps with some jitter applied to get the
    right characteristics, or the DQ model employing some type of fake CDR
    algorithm to identify the best timing location and then utilizing existing
    Reserved Parameters for AMI models with CDRs to define the strobe jitter.

    Fangyi asked if the Rx model should just take in the full DQS waveform as an
    input and figure out where to clock the DQ, rather than relying on the tool
    to generate clock ticks.  Randy said that requirement would put a lot of
    extra burden on the model maker.  Fangyi said that if you leave it up to the
    tool then you don't know what the tool is going to do.  If you're going to
    consider putting a CDR in your Rx model instead, why not just pass the DQS
    waveform into the model?  Arpad said there were two independent questions.
    Whether you send only the DQ signal (fake CDR) or also the DQS signal into
    the model, or the EDA tool determines the clock ticks, the correct phase
    still has to be determined.
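
Returning to item 3 of Walter's views, the VrefDQ granularity impairment
could be estimated along the following lines.  This is a rough sketch only:
the register range, step size, and the policy of targeting the mean DC
Offset of the DQs sharing one register are assumptions made for
illustration, not values from the DDR5 specification or from the discussion.

```python
# Sketch only: register parameters and the selection policy are assumptions.

def vrefdq_impairment(dc_offsets, vref_min, vref_step, n_codes):
    """Pick one VrefDQ register code for several DQs and report residuals.

    dc_offsets : DC Offset (V) of each DQ sharing the register
    vref_min   : voltage produced by register code 0
    vref_step  : voltage change per register code
    n_codes    : number of available register codes
    """
    target = sum(dc_offsets) / len(dc_offsets)      # assumed policy: mean
    code = round((target - vref_min) / vref_step)   # nearest register code
    code = max(0, min(n_codes - 1, code))           # clamp to valid range
    vref = vref_min + code * vref_step
    residuals = [abs(vref - d) for d in dc_offsets] # per-DQ voltage error
    return code, vref, residuals
```

Each residual is the voltage error seen by one DQ, which is the kind of
impairment Walter suggested could be rolled into Rx_Receiver_Sensitivity.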

- Walter: Motion to adjourn.
- Randy: Second.
- Arpad: Thank you all for joining.

-------------
Next meeting: 18 September 2018 12:00pm PT
-------------

IBIS Interconnect SPICE Wish List:

1) Simulator directives